AITopics

Neural Information Processing Systems Feb-18-2026, 13:03:53 GMT

e7eb8128eb26eafbe901348df1dbacdc-Paper-Conference.pdf

dataset, detection, information, (16 more...)

Country:

North America > United States (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Caspian Sea (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Energy (0.49)
Transportation (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
(2 more...)

Neural Information Processing Systems Feb-11-2026, 06:57:43 GMT

Large Language Models' Expert-level Global History Knowledge Benchmark (HiST-LLM)

Neolithic period to the Industrial Revolution and includes information reviewed and assembled by history experts and graduate research assistants.

large language model, machine learning, natural language, (19 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
Europe > Austria > Vienna (0.14)
Oceania (0.05)
(29 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Government (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Neural Information Processing Systems Oct-10-2025, 20:07:22 GMT

e7eb8128eb26eafbe901348df1dbacdc-Paper-Conference.pdf

dataset, detection, information, (16 more...)

Country:

North America > United States (0.04)
Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.04)
Caspian Sea (0.04)
(3 more...)

Genre: Research Report > Experimental Study (0.93)

Industry:

Energy (0.49)
Transportation (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Neural Information Processing Systems Oct-9-2025, 23:33:32 GMT

38cc5cba8e513547b96bc326e25610dc-Paper-Datasets_and_Benchmarks_Track.pdf

absent reasoning and evidence, inferred absent, knowledge, (14 more...)

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Washington > King County > Seattle (0.14)
Europe > Austria > Vienna (0.14)
(31 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Law (1.00)
Government (0.92)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial Intelligence Mar-27-2025

JEEM: Vision-Language Understanding in Four Arabic Dialects

Kadaoui, Karima, Atwany, Hanin, Al-Ali, Hamdan, Mohamed, Abdelrahman, Mekky, Ali, Tilga, Sergei, Fedorova, Natalia, Artemova, Ekaterina, Aldarmaki, Hanan, Kementchedjhieva, Yova

We introduce JEEM, a benchmark designed to evaluate Vision-Language Models (VLMs) on visual understanding across four Arabic-speaking countries: Jordan, The Emirates, Egypt, and Morocco. JEEM includes the tasks of image captioning and visual question answering, and features culturally rich and regionally diverse content. This dataset aims to assess the ability of VLMs to generalize across dialects and accurately interpret cultural elements in visual contexts. In an evaluation of five prominent open-source Arabic VLMs and GPT-4V, we find that the Arabic VLMs consistently underperform, struggling with both visual understanding and dialect-specific generation. While GPT-4V ranks best in this comparison, the model's linguistic competence varies across dialects, and its visual understanding capabilities lag behind. This underscores the need for more inclusive models and the value of culturally-diverse evaluation paradigms.

caption, large language model, machine learning, (20 more...)

2503.21910

Country:

Asia > Middle East > Jordan (0.25)
Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
(16 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (1.00)
Health & Medicine (1.00)
Transportation > Ground (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)

arXiv.org Artificial Intelligence Feb-28-2025

Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?

Almaoui, Perla Al, Bouillon, Pierrette, Hengchen, Simon

In this era of rapid technological advancements, communication continues to evolve as new linguistic phenomena emerge. Among these is Arabizi, a hybrid form of Arabic that incorporates Latin characters and numbers to represent the spoken dialects of Arab communities. Arabizi is widely used on social media and allows people to communicate in an informal and dynamic way, but it poses significant challenges for machine translation due to its lack of formal structure and deeply embedded cultural nuances. This case study arises from a growing need to translate Arabizi for gisting purposes. It evaluates the capacity of different LLMs to decode and translate Arabizi, focusing on multiple Arabic dialects that have rarely been studied up until now. Using a combination of human evaluators and automatic metrics, this research project investigates the models' performance in translating Arabizi into both Modern Standard Arabic and English. Key questions explored include which dialects are translated most effectively and whether translations into English surpass those into Arabic.

arabic, dialect, translation, (11 more...)

2502.20973

Country:

Africa > Middle East > Algeria (0.06)
Asia > Middle East > Lebanon (0.05)
Africa > Middle East > Egypt > Cairo Governorate > Cairo (0.04)
(11 more...)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

arXiv.org Artificial Intelligence Oct-30-2024

Evaluating Cultural and Social Awareness of LLM Web Agents

Qiu, Haoyi, Fabbri, Alexander R., Agarwal, Divyansh, Huang, Kung-Hsiang, Tan, Sarah, Peng, Nanyun, Wu, Chien-Sheng

As large language models (LLMs) expand into performing as agents for real-world applications beyond traditional NLP tasks, evaluating their robustness becomes increasingly important. However, existing benchmarks often overlook critical dimensions like cultural and social awareness. To address these, we introduce CASA, a benchmark designed to assess LLM agents' sensitivity to cultural and social norms across two web-based tasks: online shopping and social discussion forums. Our approach evaluates LLM agents' ability to detect and appropriately respond to norm-violating user queries and observations. Furthermore, we propose a comprehensive evaluation framework that measures awareness coverage, helpfulness in managing user queries, and the violation rate when facing misleading web content. Experiments show that current LLMs perform significantly better in non-agent than in web-based agent environments, with agents achieving less than 10% awareness coverage and over 40% violation rates. To improve performance, we explore two methods: prompting and fine-tuning, and find that combining both methods can offer complementary advantages -- fine-tuning on culture-specific datasets significantly enhances the agents' ability to generalize across different regions, while prompting boosts the agents' ability to navigate complex tasks. These findings highlight the importance of constantly benchmarking LLM agents' cultural and social awareness during the development cycle.

agent, user query, website, (15 more...)

2410.23252

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Asia > China (0.05)
Asia > Middle East > Iran (0.05)
(21 more...)

Genre:

Research Report (0.64)
Workflow (0.46)

Industry:

Law (1.00)
Government > Immigration & Customs (0.67)
Information Technology > Services > e-Commerce Services (0.36)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.33)

arXiv.org Artificial Intelligence Oct-22-2024

Arabic Dataset for LLM Safeguard Evaluation

Ashraf, Yasser, Wang, Yuxia, Gu, Bin, Nakov, Preslav, Baldwin, Timothy

The growing use of large language models (LLMs) has raised concerns regarding their safety. While many studies have focused on English, the safety of LLMs in Arabic, with its linguistic and cultural complexities, remains under-explored. Here, we aim to bridge this gap. In particular, we present an Arab-region-specific safety evaluation dataset consisting of 5,799 questions, including direct attacks, indirect attacks, and harmless requests with sensitive words, adapted to reflect the socio-cultural context of the Arab world. To uncover the impact of different stances in handling sensitive and controversial topics, we propose a dual-perspective evaluation framework. It assesses the LLM responses from both governmental and opposition viewpoints. Experiments over five leading Arabic-centric and multilingual LLMs reveal substantial disparities in their safety performance. This reinforces the need for culturally specific datasets to ensure the responsible deployment of LLMs.

government, large language model, machine learning, (20 more...)

2410.17040

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
Asia > Singapore (0.04)
North America > Mexico > Mexico City > Mexico City (0.04)
(11 more...)

Genre: Research Report (0.40)

Industry:

Government (1.00)
Law Enforcement & Public Safety (0.93)
Law > Civil Rights & Constitutional Law (0.70)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

arXiv.org Artificial Intelligence Aug-26-2024

Students' Perceived Roles, Opportunities, and Challenges of a Generative AI-powered Teachable Agent: A Case of Middle School Math Class

Song, Yukyeong, Kim, Jinhee, Liu, Zifeng, Li, Chenglu, Xing, Wanli

Ongoing advancements in Generative AI (GenAI) have boosted the potential of applying long-standing "learning-by-teaching" practices in the form of a teachable agent (TA). Despite the recognized roles and opportunities of TAs, less is known about how GenAI could create synergy or introduce challenges in TAs and how students perceived the application of GenAI in TAs. This study explored middle school students' perceived roles, benefits, and challenges of GenAI-powered TAs in an authentic mathematics classroom. Through classroom observation, focus-group interviews, and open-ended surveys of 108 sixth-grade students, we found that students expected the GenAI-powered TA to serve as a learning companion, facilitator, and collaborative problem-solver. Students also expressed the benefits and challenges of GenAI-powered TAs. This study provides implications for the design of educational AI and AI-assisted instruction.

genai-powered ta, learning, student, (12 more...)